Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

General Text Line Extraction Approach based on Locally Orientation Estimation

Internal identifier: 003177 (Main/Exploration); previous: 003176; next: 003178

Authors: Nazih Ouwayed [France]; Abdel Belaïd [France]; François Auger [France]

Source: Proceedings of SPIE, the International Society for Optical Engineering (ISSN 0277-786X), 2010

RBID : Pascal:10-0429721

French descriptors

English descriptors

Abstract

This paper presents a novel approach to multi-oriented text line extraction from historical handwritten Arabic documents. Because the lines are multi-oriented and dispersed across the page, we use an image paving algorithm that determines the lines progressively and locally. The paving algorithm is initialized with a small window, whose size is then corrected by extension until enough lines and connected components are found. We use the Snake for line extraction. Once the paving is established, the orientation is determined using the Wigner-Ville distribution on the histogram projection profile. This local orientation is then extended to constrain the orientation in the neighborhood. Afterwards, the text lines are extracted locally in each zone, based on following the baselines and on the proximity of connected components. Finally, the connected components that overlap and touch in adjacent lines are separated; the morphological analysis of the terminal letters of Arabic words is considered here. The proposed approach was tested on 100 documents, reaching a separation accuracy of about 98.6%.
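
To make the idea of reading a local orientation from a projection profile more concrete, here is a minimal Python sketch. It is not the authors' method: the abstract applies the Wigner-Ville distribution to the projection profile, whereas this sketch substitutes a much simpler criterion (it tries a set of candidate angles and keeps the one whose profile is most sharply peaked). All function names, the angle range, the bin size, and the synthetic test data are assumptions made for illustration.

```python
import math

def projection_profile(points, angle_deg, bin_size=1.0):
    """Histogram of (x, y) foreground points along the normal to a candidate line direction."""
    theta = math.radians(angle_deg)
    # Signed distance of each point from the line through the origin at angle theta
    # (assumption: angles are counterclockwise in a frame where y points up).
    offsets = [y * math.cos(theta) - x * math.sin(theta) for x, y in points]
    lowest = min(offsets)
    counts = {}
    for d in offsets:
        k = int((d - lowest) // bin_size)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(max(counts) + 1)]

def profile_energy(profile):
    """Sum of squared bin counts: large when points pile up in a few bins."""
    return sum(c * c for c in profile)

def estimate_orientation(points, candidate_angles=range(-45, 46)):
    """Return the candidate angle (degrees) whose profile is most sharply peaked.

    Simplified stand-in for the Wigner-Ville analysis named in the abstract.
    """
    return max(candidate_angles,
               key=lambda a: profile_energy(projection_profile(points, a)))

def synthetic_lines(angle_deg, offsets=(0.0, 10.4, 20.7), length=200):
    """Hypothetical test data: parallel straight 'text lines' at a known angle."""
    t0 = math.radians(angle_deg)
    return [(o * -math.sin(t0) + t * math.cos(t0),
             o * math.cos(t0) + t * math.sin(t0))
            for o in offsets for t in range(length)]

zone = synthetic_lines(20)
print(estimate_orientation(zone))
```

In a paved page, such an estimate would be computed once per window, so each zone receives its own angle. Real handwriting is far noisier than these synthetic lines, which is presumably why the paper relies on a time-frequency analysis instead of a plain peak search.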


Affiliations:


Links toward previous steps (curation, corpus...)


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">General Text Line Extraction Approach based on Locally Orientation Estimation</title>
<author>
<name sortKey="Ouwayed, Nazih" sort="Ouwayed, Nazih" uniqKey="Ouwayed N" first="Nazih" last="Ouwayed">Nazih Ouwayed</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>LORIA- University of Nancy 2, Campus Scientifique, B.P. 239</s1>
<s2>54506 Vandoeuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>LORIA- University of Nancy 2, Campus Scientifique, B.P. 239</s1>
<s2>54506 Vandoeuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
<author>
<name sortKey="Auger, Francois" sort="Auger, Francois" uniqKey="Auger F" first="François" last="Auger">François Auger</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>University of Nantes, IREENA, BP 406</s1>
<s2>44602 Saint-Nazaire</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Pays de la Loire</region>
<settlement type="city">Saint-Nazaire</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0429721</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0429721 INIST</idno>
<idno type="RBID">Pascal:10-0429721</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000203</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000816</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000176</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000176</idno>
<idno type="wicri:doubleKey">0277-786X:2010:Ouwayed N:general:text:line</idno>
<idno type="wicri:Area/Main/Merge">003239</idno>
<idno type="wicri:Area/Main/Curation">003177</idno>
<idno type="wicri:Area/Main/Exploration">003177</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">General Text Line Extraction Approach based on Locally Orientation Estimation</title>
<author>
<name sortKey="Ouwayed, Nazih" sort="Ouwayed, Nazih" uniqKey="Ouwayed N" first="Nazih" last="Ouwayed">Nazih Ouwayed</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>LORIA- University of Nancy 2, Campus Scientifique, B.P. 239</s1>
<s2>54506 Vandoeuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>LORIA- University of Nancy 2, Campus Scientifique, B.P. 239</s1>
<s2>54506 Vandoeuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
<author>
<name sortKey="Auger, Francois" sort="Auger, Francois" uniqKey="Auger F" first="François" last="Auger">François Auger</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>University of Nantes, IREENA, BP 406</s1>
<s2>44602 Saint-Nazaire</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Pays de la Loire</region>
<settlement type="city">Saint-Nazaire</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Accuracy</term>
<term>Algorithms</term>
<term>Arabic</term>
<term>Baseline</term>
<term>Document retrieval</term>
<term>Histogram</term>
<term>Manuscript character</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Position measurement</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Algorithme</term>
<term>Reconnaissance forme</term>
<term>Recherche documentaire</term>
<term>Extraction forme</term>
<term>Mesure position</term>
<term>Caractère manuscrit</term>
<term>Arabe</term>
<term>Histogramme</term>
<term>Ligne de base</term>
<term>Précision</term>
<term>0130C</term>
<term>4230S</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents a novel approach to multi-oriented text line extraction from historical handwritten Arabic documents. Because the lines are multi-oriented and dispersed across the page, we use an image paving algorithm that determines the lines progressively and locally. The paving algorithm is initialized with a small window, whose size is then corrected by extension until enough lines and connected components are found. We use the Snake for line extraction. Once the paving is established, the orientation is determined using the Wigner-Ville distribution on the histogram projection profile. This local orientation is then extended to constrain the orientation in the neighborhood. Afterwards, the text lines are extracted locally in each zone, based on following the baselines and on the proximity of connected components. Finally, the connected components that overlap and touch in adjacent lines are separated; the morphological analysis of the terminal letters of Arabic words is considered here. The proposed approach was tested on 100 documents, reaching a separation accuracy of about 98.6%.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Grand Est</li>
<li>Lorraine (région)</li>
<li>Pays de la Loire</li>
</region>
<settlement>
<li>Nancy</li>
<li>Saint-Nazaire</li>
<li>Vandœuvre-lès-Nancy</li>
</settlement>
<orgName>
<li>Centre national de la recherche scientifique</li>
<li>Institut national de recherche en informatique et en automatique</li>
<li>Laboratoire lorrain de recherche en informatique et ses applications</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Grand Est">
<name sortKey="Ouwayed, Nazih" sort="Ouwayed, Nazih" uniqKey="Ouwayed N" first="Nazih" last="Ouwayed">Nazih Ouwayed</name>
</region>
<name sortKey="Auger, Francois" sort="Auger, Francois" uniqKey="Auger F" first="François" last="Auger">François Auger</name>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
</country>
</tree>
</affiliations>
</record>
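
The record above can also be read programmatically. The following Python sketch, using only the standard library, pulls out the title, authors, RBID, and English keywords. One caveat: the record uses the `wicri:` and `inist:` prefixes without declaring them, which a namespace-aware parser rejects, so the sketch injects placeholder namespace URIs before parsing. Those URIs, the function names, and the shortened `SAMPLE` record are assumptions for illustration and are not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record uses these prefixes
# without declaring them, and the parser refuses unbound prefixes.
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "inist": "urn:example:inist"}

def parse_record(xml_text):
    """Parse a <record> after declaring the placeholder namespaces on its root."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)

def summarize(record):
    """Collect a few bibliographic fields from the parsed record."""
    analytic = record.find(".//analytic")
    return {
        "title": analytic.findtext("title"),
        "authors": [n.text for n in analytic.findall("author/name")],
        "rbid": record.findtext(".//idno[@type='RBID']"),
        "keywords_en": [t.text for t in
                        record.findall(".//keywords[@scheme='KwdEn']/term")],
    }

# Shortened copy of the record, kept small for the example.
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<publicationStmt><idno type="RBID">Pascal:10-0429721</idno></publicationStmt>
<sourceDesc><biblStruct><analytic>
<title xml:lang="en" level="a">General Text Line Extraction Approach based on Locally Orientation Estimation</title>
<author><name sortKey="Ouwayed, Nazih">Nazih Ouwayed</name><affiliation wicri:level="3"/></author>
</analytic></biblStruct></sourceDesc></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en">
<term>Arabic</term><term>Baseline</term></keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

print(summarize(parse_record(SAMPLE)))
```

The authors are read from the `analytic` block only, because the same author entries also appear under `titleStmt` and would otherwise be counted twice.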

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003177 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 003177 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0429721
   |texte=   General Text Line Extraction Approach based on Locally Orientation Estimation
}}
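
When links are needed for many records, the template call above can be generated instead of typed by hand. The Python sketch below reproduces the layout shown on this page, including the column alignment. The function name and its parameters are hypothetical, and whether the wiki template actually depends on that spacing is not stated here, so the alignment is kept only to match the page.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Render an 'Explor lien' template call in the layout used on this page."""
    fields = [("wiki", wiki), ("area", area), ("flux", flux), ("étape", step),
              ("type", key_type), ("clé", key), ("texte", text)]
    lines = ["{{Explor lien"]
    for name, value in fields:
        label = name + "="
        # Pad "name=" to 9 characters so the values line up as on the page.
        lines.append(f"   |{label:<9}{value}")
    lines.append("}}")
    return "\n".join(lines)

print(explor_link("Wicri/Lorraine", "InforLorV4", "Main", "Exploration",
                  "RBID", "Pascal:10-0429721",
                  "General Text Line Extraction Approach based on Locally Orientation Estimation"))
```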

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022